options(width = 100)

The chapter opens with the Ptolemaic geocentric model of the solar system. Ptolemy's model predicted the locations of the planets accurately enough for stargazing and naval navigation.

McElreath describes linear regression as

* "a family of simple statistical golems that attempt to learn about the mean and variance of some measurement, using an additive combination of other measurements" (p. 71)
* "a descriptive model that corresponds to many different process models"

4.1. Why normal distributions are normal

4.1.1. Normal by addition.

In this document, we'll use the tidyverse for data wrangling and plotting. Loading it attaches dplyr, tidyr, ggplot2, stringr, purrr, and friends, all of which we'll lean on below.

library(tidyverse)

Moving forward, here's a way to do the simulation necessary for the plot in the top panel of Figure 4.2.

# We set the seed to make the results of runif() reproducible. You can put any old number you like in set.seed(), but to exactly reproduce my results, you'll have to put in 1000.
set.seed(1000)
pos <- 
  replicate(100, runif(16, -1, 1)) %>%        # Here's the simulation
  as_tibble() %>%                             # For data manipulation, we'll make this a tibble
  rbind(0, .) %>%                             # Here we add a row of zeros above the simulation results
  mutate(step = 0:16) %>%                     # This adds our step index
  gather(key, value, -step) %>%               # Here we convert the data to the long format
  mutate(person = rep(1:100, each = 17)) %>%  # This adds a person id index
  # The next two lines allow us to make cumulative sums within each person
  group_by(person) %>%
  mutate(position = cumsum(value)) %>%
  ungroup()  # Ungrouping allows for further data manipulation
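
As an aside of my own, here's an equivalent way to run the same simulation in a more explicitly functional style with purrr::map_dfr() (purrr loads with the tidyverse). This is just a sketch for comparison; the plot code below keeps using pos.

set.seed(1000)
pos_alt <-
  map_dfr(1:100,
          ~ tibble(person   = .x,     # person id
                   step     = 0:16,   # step index, 0 = the starting point
                   position = cumsum(c(0, runif(16, -1, 1)))))  # running sum of steps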

And here's the actual plot code.

ggplot(data = pos, 
       aes(x = step, y = position, group = person)) +
  geom_vline(xintercept = c(4, 8, 16), linetype = 2) +
  geom_line(aes(color = person < 2, alpha  = person < 2)) +
  scale_color_manual(values = c("skyblue4", "black")) +
  scale_alpha_manual(values = c(1/5, 1)) +
  scale_x_continuous("step number", breaks = c(0, 4, 8, 12, 16)) +
  theme(legend.position = "none")

Here's the code for the bottom three plots of Figure 4.2.

# Figure 4.2.a.
pos %>%
  filter(step == 4) %>%
  ggplot(aes(x = position)) +
  geom_line(stat = "density", color = "dodgerblue1") +
  coord_cartesian(xlim = c(-6, 6)) +
  labs(title = "4 steps")

# Figure 4.2.b.
pos %>%
  filter(step == 8) %>%
  ggplot(aes(x = position)) +
  geom_density(color = "dodgerblue2") +
  coord_cartesian(xlim = c(-6, 6)) +
  labs(title = "8 steps")

# An intermediary step to get an SD value
pos %>%
  filter(step == 16) %>%
  summarise(sd = sd(position))

# Figure 4.2.c.
pos %>%
  filter(step == 16) %>%
  ggplot(aes(x = position)) +
  geom_line(data = tibble(position = seq(from = -7, to = 7, by = .1)),
            aes(x = position, y = dnorm(position, 0, 2.381768)),
            linetype = 2) +  # 2.381768 came from the previous code block
  geom_density(color = "transparent", fill = "dodgerblue3", alpha = 1/2) +
  coord_cartesian(xlim = c(-6, 6)) +
  labs(title = "16 steps",
       y = "density")

While we were at it, we explored a few ways to express densities. The main action was with the geom_line() and geom_density() functions.
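
If the distinction is new to you, here's a toy demonstration of my own (not from the text): geom_density() is essentially a convenience for plotting stat = "density" output as a filled area, while geom_line(stat = "density") draws only the outline.

set.seed(4)
tibble(x = rnorm(1e4)) %>%
  ggplot(aes(x = x)) +
  geom_line(stat = "density", color = "dodgerblue1") +  # outline only
  geom_density(color = "transparent",                   # filled version of the same density
               fill = "dodgerblue3", alpha = 1/4)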

4.1.2. Normal by multiplication.

Here's a single product of 12 small random deviations, followed by a simulation of 10,000 such products and their density.

prod(1 + runif(12, 0, 0.1))
set.seed(.1)
growth <- 
  replicate(10000, prod(1 + runif(12, 0, 0.1))) %>%
  as_tibble()

ggplot(data = growth, aes(x = value)) +
  geom_density()
set.seed(1.7195451)
big <- 
  replicate(10000, prod(1 + runif(12, 0, 0.5))) %>%
  as_tibble() 

set.seed(1.7195451)
small <- 
  replicate(10000, prod(1 + runif(12, 0, 0.01))) %>%
  as_tibble()

ggplot(data = big) +
  geom_density(aes(x = value), fill = "black") +
  labs(title = "My big density")

ggplot(data = small) +
  geom_density(aes(x = value), fill = "black") +
  labs(title = "My small density")

Here's a histogram to remind me that the distribution describes the frequency of the products.

big %>% 
  ggplot(aes(x = value)) + 
  geom_histogram()

4.1.3. Normal by log-multiplication.

As a test case, here's what happens when you build a distribution from products of large deviations: the result is not normal.

bigger <- 
  replicate(10000, prod(1 + runif(12, 0, 200))) %>%
  as_tibble() 

bigger %>% 
  ggplot(aes(x = value)) + 
  geom_density(fill = "black") +
  labs(title = "My bigger non-normal density")
set.seed(12)
replicate(10000, log(prod(1 + runif(12, 0, 0.5)))) %>%
  as_tibble() %>%
  ggplot(aes(x = value)) +
  geom_density(color = "transparent", 
               fill = "gray33")
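
To convince yourself the result really is approximately Gaussian, here's a quick check of my own: overlay a normal curve that uses the simulated values' own mean and standard deviation.

set.seed(12)
log_products <- replicate(1e4, log(prod(1 + runif(12, 0, 0.5))))

tibble(value = log_products) %>%
  ggplot(aes(x = value)) +
  geom_density(color = "transparent", fill = "gray33") +
  # dashed reference curve: a normal with the sample's mean and sd
  stat_function(fun  = dnorm,
                args = list(mean = mean(log_products),
                            sd   = sd(log_products)),
                linetype = 2)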

4.3. A Gaussian model of height

4.3.1. The data.

Let's get the data from McElreath's rethinking package.

library(rethinking)
data(Howell1)
d <- Howell1

Here we open our main statistical package, Bürkner's brms. But before we do, we'll need to detach the rethinking package. When two loaded packages contain functions with the same name, R masks one function with the other, which invites confusing errors. The rethinking and brms packages are designed for similar purposes and, unsurprisingly, overlap in the names of their functions. To prevent problems, we will always make sure rethinking is detached before using brms. To learn more on the topic, see this R-bloggers blog.

rm(Howell1)
detach(package:rethinking, unload = T)
library(brms)

Go ahead and investigate the data with str(), the tidyverse analogue for which is glimpse().

str(d)

Here are the height values.

d %>%
  select(height) %>%
  head()

We can make an adults-only data frame like so.

d2 <- 
  d %>%
  filter(age >= 18)

4.3.2. The model.

Here's the shape of the prior for $\mu$, N(178, 20).

ggplot(data = tibble(x = seq(from = 100, to = 250, by = .1)), 
       aes(x = x, y = dnorm(x, mean = 178, sd = 20))) +
  geom_line() +
  ylab("density")

And here's the ggplot2 code for our prior for $\sigma$, a uniform distribution with a minimum value of 0 and a maximum value of 50. We don't really need the y axis when looking at the shapes of a density, so we'll just remove it.

tibble(x = seq(from = -10, to = 60, by = 1)) %>%
  ggplot(aes(x = x, y = dunif(x, min = 0, max = 50))) +
  geom_line() +
  scale_y_continuous(NULL, breaks = NULL) +
  theme(panel.grid = element_blank())

Here we simulate from those two priors to get a prior probability distribution of heights.

sample_mu <- rnorm(1e4, 178, 20)
sample_sigma <- runif(1e4, 0, 50)

tibble(means = rnorm(1e4, sample_mu, sample_sigma)) %>%
  ggplot(aes(x = means)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  theme(panel.grid = element_blank()) + 
  labs(title = "The expected distribution of heights averaged over the prior")

As McElreath wrote, we've made a "vaguely bell-shaped density with thick tails" (p. 83).
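
If you'd like to quantify those thick tails, here's a quick summary of the prior predictive draws (my addition, reusing the sample_mu and sample_sigma vectors from above).

tibble(height = rnorm(1e4, sample_mu, sample_sigma)) %>%
  summarise(`2.5%`  = quantile(height, probs = .025),
            median  = median(height),
            `97.5%` = quantile(height, probs = .975))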

4.3.3. Grid approximation of the posterior distribution.

I'm not going to dwell on this. As McElreath explained, you'll never use grid approximation for practical data analysis; we'll pick up with McElreath's map() models soon enough.

But I want to see it, so here's McElreath's grid-approximation code anyway.

## R code 4.14
# Create grid of all mu and sigma pairings
mu.list <- seq( from=140, to=160 , length.out=200 )
sigma.list <- seq( from=4 , to=9 , length.out=200 )
post <- expand.grid( mu=mu.list , sigma=sigma.list )

post$LL <- sapply( 1:nrow(post) , function(i) sum( dnorm(
                d2$height ,
                mean=post$mu[i] ,
                sd=post$sigma[i] ,
                log=TRUE ) ) )

post$prod <- post$LL + dnorm( post$mu , 178 , 20 , TRUE ) +
    dunif( post$sigma , 0 , 50 , TRUE )
post$prob <- exp( post$prod - max(post$prod) )

## contour plot to inspect the posterior
rethinking::contour_xyz( post$mu , post$sigma , post$prob )

## heatmap to inspect the posterior
rethinking::image_xyz( post$mu , post$sigma , post$prob )
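
Since this project leans on ggplot2, here's a geom_raster() version of that heat map as well. This is my own addition; scale_fill_viridis_c() assumes ggplot2 3.0.0 or later.

post %>%
  ggplot(aes(x = mu, y = sigma, fill = prob)) +
  geom_raster() +  # one tile per mu/sigma combination, shaded by relative probability
  scale_fill_viridis_c() +
  labs(x = expression(mu), y = expression(sigma)) +
  theme(legend.position = "none")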

Sample combinations of parameters by randomly sampling row numbers in post in proportion to the values in post$prob.

## R code 4.17
sample.rows <- sample( 1:nrow(post) , size=1e4 , replace=TRUE ,
    prob=post$prob )
sample.mu <- post$mu[ sample.rows ]
sample.sigma <- post$sigma[ sample.rows ]

## R code 4.18
plot( sample.mu , sample.sigma , cex=0.5 , pch=16 , col=rethinking::col.alpha(rethinking::rangi2,0.1) )
## R code 4.19
rethinking::dens( sample.mu )
rethinking::dens( sample.sigma )

## R code 4.20
rethinking::HPDI( sample.mu )
rethinking::HPDI( sample.sigma )

sigma has a longer right tail than left because standard deviations must be positive, which puts a hard bound at zero on the left.

## R code 4.21
d3 <- sample( d2$height , size=20 )

## R code 4.22
mu.list <- seq( from=150, to=170 , length.out=200 )
sigma.list <- seq( from=4 , to=20 , length.out=200 )
post2 <- expand.grid( mu=mu.list , sigma=sigma.list )
post2$LL <- sapply( 1:nrow(post2) , function(i)
    sum( dnorm( d3 , mean=post2$mu[i] , sd=post2$sigma[i] ,
    log=TRUE ) ) )
post2$prod <- post2$LL + dnorm( post2$mu , 178 , 20 , TRUE ) +
    dunif( post2$sigma , 0 , 50 , TRUE )
post2$prob <- exp( post2$prod - max(post2$prod) )
sample2.rows <- sample( 1:nrow(post2) , size=1e4 , replace=TRUE ,
    prob=post2$prob )
sample2.mu <- post2$mu[ sample2.rows ]
sample2.sigma <- post2$sigma[ sample2.rows ]
plot( sample2.mu , sample2.sigma , cex=0.5 ,
    col=rethinking::col.alpha(rethinking::rangi2,0.1) ,
    xlab="mu" , ylab="sigma" , pch=16 )

## R code 4.23
rethinking::dens( sample2.sigma , norm.comp=TRUE )

4.3.5. Fitting the model with ~~map()~~ brm().

We won't actually use map(), but will jump straight to the primary brms function, brm(). In the text, McElreath indexes his models with names like m4.1. I will largely follow that convention, but will replace the m with a b to stand for the brms package. Here's the first model.

b4.1 <- 
  brm(data = d2, family = gaussian,
      height ~ 1,
      prior = c(set_prior("normal(178, 20)", class = "Intercept"),
                set_prior("uniform(0, 50)", class = "sigma")),
      chains = 4, iter = 31000, warmup = 30000, cores = 4)

Note the warning message. Stan, which is the engine under the hood of our brms vehicle, gets concerned when you use priors with hard bounds on parameters for which we wouldn't expect hard bounds. There's no reason to expect a hard upper bound on $\sigma$, so Stan barks. Beware of unneeded hard bounds.

McElreath's uniform prior for $\sigma$ was rough on brms. It took an unusually large number of warmup iterations before the chains sampled properly. As McElreath covers in chapter 8, HMC tends to work better when you assign $\sigma$ a half Cauchy prior instead. Here's how to do so.

b4.1halfCauchy <- 
  brm(data = d2, family = gaussian,
      height ~ 1,
      prior = c(set_prior("normal(178, 20)", class = "Intercept"),
                set_prior("cauchy(0, 1)", class = "sigma")),
      chains = 4, iter = 2000, warmup = 1000, cores = 4)
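
In case the half Cauchy is unfamiliar, here's a quick look at its shape, a small addition of mine. Because brms knows $\sigma$ has a lower bound of zero, a cauchy(0, 1) prior on it acts as a half Cauchy, whose density is just twice the full Cauchy density over the non-negative values.

tibble(x = seq(from = 0, to = 10, by = .025)) %>%
  ggplot(aes(x = x, y = 2 * dcauchy(x, location = 0, scale = 1))) +
  geom_line() +
  scale_y_continuous(NULL, breaks = NULL) +
  xlab(expression(sigma)) +
  theme(panel.grid = element_blank())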

This leads to an important point. After running an HMC model, it's a good idea to inspect the chains. McElreath covers this in chapter 8. Here's a typical way to do so in brms.

plot(b4.1halfCauchy)

If you want detailed diagnostics for the HMC chains, call launch_shinystan(b4.1). That'll keep you busy for a while. But anyway, the chains look good. We can reasonably trust the results.

Here's how to get the model summary, the brms equivalent to rethinking's precis().

print(b4.1halfCauchy)

The summary() function works in a similar way.

You can also get a Stan-like summary with this:

b4.1halfCauchy$fit

Whereas rethinking defaults to 89% intervals, using print() or summary() with brms models defaults to 95% intervals. Unless otherwise specified, I will stick with 95% intervals throughout. However, if you really want those 89% intervals, an easy way is with the prob argument within brms::summary() or brms::print().

summary(b4.1halfCauchy, prob = .89)
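
Yet another option, if all you want are the intervals themselves, is the posterior_interval() function.

posterior_interval(b4.1halfCauchy, prob = .89)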

Anyways, here's the shockingly-narrow-$\mu$-prior model.

b4.2 <- 
  brm(data = d2, family = gaussian,
      height ~ 1,
      prior = c(set_prior("normal(178, .1)", class = "Intercept"),
                set_prior("uniform(0, 50)", class = "sigma")),
      chains = 4, iter = 3000, warmup = 2000, cores = 4)

plot(b4.2)
print(b4.2)

I had to increase the warmup due to convergence issues. After doing so, everything looks to be on the up and up.

4.3.6. Sampling from a ~~map()~~ brm() fit.

brms doesn't seem to have a convenience function that works the way vcov() does for rethinking. For example:

vcov(b4.1halfCauchy)

For our intercept-only model, this returns only the variance of the intercept, a 1 × 1 matrix, rather than the full variance/covariance matrix that vcov() returns for rethinking models.

However, if you really wanted this information, you could get it after putting the HMC chains in a data frame.

post <- posterior_samples(b4.1halfCauchy)
cov(post[, 1:2])

That was "(1) a vector of variances for the parameters and (2) a correlation matrix" for them (p. 90). Here are just the variances (i.e., the diagonal elements) and the correlation matrix.

post[, 1:2] %>%
  cov() %>%
  diag()

post %>%
  select(b_Intercept, sigma) %>%
  cor()

With our post <- posterior_samples(b4.1halfCauchy) code, a few lines above, we've already done the brms version of what McElreath did with extract.samples(). However, what happened under the hood was different. Whereas rethinking used the mvrnorm() function from the MASS package, in brms we just extracted the iterations of the HMC chains and put them in a data frame.

head(post)

Notice how our data frame, post, includes a third vector, lp__, which is the log posterior. See the brms manual for details.

The summary() function doesn't work for brms posterior data frames quite the way precis() does for posterior data frames from the rethinking package.

summary(post[, 1:2])

Here's one option using the transpose of a quantile() call nested within apply(), which is a very general function you can learn more about here or here.

t(apply(post[, 1:2], 2, quantile, probs = c(.5, .025, .975)))

The base R code is compact, but somewhat opaque. Here's how to do something similar with more explicit tidyverse code.

post %>%
  select(-lp__) %>% 
  gather(parameter) %>%
  group_by(parameter) %>%
  summarise(mean = mean(value),
            SD   = sd(value),
            `2.5_percentile`  = quantile(value, probs = .025),
            `97.5_percentile` = quantile(value, probs = .975)) %>%
  mutate_if(is.numeric, round, digits = 2)

With respect to McElreath's "Overthinking: Getting $\sigma$ right.", there's no need to fret about this in brms. With HMC, we are not constraining the posteriors to the multivariate normal distribution. Here's our posterior density for $\sigma$.

ggplot(data = post, 
       aes(x = sigma)) +
  geom_density(size = 1/10, fill = "black") +
  scale_y_continuous(NULL, breaks = NULL) +
  xlab(expression(sigma)) +
  theme(panel.grid = element_blank())

See? HMC handled the mild skew just fine.

4.4. Adding a predictor

Here's our scatter plot of weight and height.

ggplot(data = d2, 
       aes(x = weight, y = height)) +
  theme_bw() +
  geom_point(shape = 1, size = 2) +
  theme(panel.grid = element_blank())

"The linear model strategy. The strategy is to make the parameter for the mean of a Gaussian distribution, µ, into a linear function of the predictor variable and other, new parameters that we invent."

4.4.2. Fitting the model.

b4.3 <- 
  brm(data = d2, family = gaussian,
      height ~ 1 + weight,
      prior = c(set_prior("normal(156, 100)", class = "Intercept"),
                set_prior("normal(0, 10)", class = "b"),
                set_prior("uniform(0, 50)", class = "sigma")),
      chains = 4, iter = 41000, warmup = 40000, cores = 4)

plot(b4.3)
print(b4.3)

This was another example of how putting a uniform prior on $\sigma$ required an unusually large number of warmup iterations before the HMC chains converged on the posterior. Change the prior to "cauchy(0, 1)" and they converge with no problem and have much better effective samples, too.

Again, brms doesn't have a convenient corr = TRUE argument for plot() or summary(). But you can get that information after putting the chains in a data frame.

posterior_samples(b4.3) %>%
  select(-lp__) %>%
  cor() %>%
  round(digits = 2)

With centering, we can reduce the correlations among the parameters.

d2 <- 
  d2 %>%
  mutate(weight.c = weight - mean(weight))

The weight.c model:

b4.4 <- 
  brm(data = d2, family = gaussian,
      height ~ 1 + weight.c,
      prior = c(set_prior("normal(178, 100)", class = "Intercept"),
                set_prior("normal(0, 10)", class = "b"),
                set_prior("uniform(0, 50)", class = "sigma")),
      chains = 4, iter = 41000, warmup = 40000, cores = 4,
      control = list(adapt_delta = 0.8, 
                     max_treedepth = 10))

plot(b4.4)
print(b4.4)

Like before, the uniform prior required extensive warmup iterations to produce a good posterior. This is easily fixed by using a half Cauchy prior instead. Anyways, the effective samples improved. Here's the parameter correlation info.

posterior_samples(b4.4) %>%
  select(-lp__) %>%
  cor() %>%
  round(digits = 2)

See? Now all the correlations are quite low. Also, if you prefer a visual approach, you might do pairs(b4.4).

4.4.3.2. Plotting posterior inference against the data.

Here is the code for Figure 4.4. Note our use of the fixef() function.

d2 %>%
  ggplot(aes(x = weight, y = height)) +
  theme_bw() +
  geom_abline(intercept = fixef(b4.3)[1], 
              slope = fixef(b4.3)[2]) +
  geom_point(shape = 1, size = 2, color = "royalblue") +
  theme(panel.grid = element_blank())

In preparation for the next few figures, let's put the chains in a data frame and inspect the first five rows.

post <- posterior_samples(b4.3)

post %>% slice(1:5)  

Here are the four models leading up to McElreath's Figure 4.5. To reduce my computation time, I used a half Cauchy(0, 1) prior on $\sigma$. If you are willing to wait for the warmups, switching that out for McElreath's uniform prior should work fine as well.

N <- 10
dN <- 
  d2 %>%
  slice(1:N)

b10 <- 
  brm(data = dN, family = "gaussian",
      height ~ 1 + weight,
      prior = c(set_prior("normal(178, 100)", class = "Intercept"),
                set_prior("normal(0, 10)", class = "b"),
                set_prior("cauchy(0, 1)", class = "sigma")),
      chains = 4, iter = 2000, warmup = 1000, cores = 4)

N <- 50
dN <- 
  d2 %>%
  slice(1:N)

b50 <- 
  brm(data = dN, family = "gaussian",
      height ~ 1 + weight,
      prior = c(set_prior("normal(178, 100)", class = "Intercept"),
                set_prior("normal(0, 10)", class = "b"),
                set_prior("cauchy(0, 1)", class = "sigma")),
      chains = 4, iter = 2000, warmup = 1000, cores = 4)

N <- 150
dN <- 
  d2 %>%
  slice(1:N)

b150 <- 
  brm(data = dN, family = "gaussian",
      height ~ 1 + weight,
      prior = c(set_prior("normal(178, 100)", class = "Intercept"),
                set_prior("normal(0, 10)", class = "b"),
                set_prior("cauchy(0, 1)", class = "sigma")),
      chains = 4, iter = 2000, warmup = 1000, cores = 4)

N <- 352
dN <- 
  d2 %>%
  slice(1:N)

b352 <- 
  brm(data = dN, family = "gaussian",
      height ~ 1 + weight,
      prior = c(set_prior("normal(178, 100)", class = "Intercept"),
                set_prior("normal(0, 10)", class = "b"),
                set_prior("cauchy(0, 1)", class = "sigma")),
      chains = 4, iter = 2000, warmup = 1000, cores = 4)

I'm not going to clutter up the document with all the trace plots and coefficient summaries from these four models. But here's how to get that information.

plot(b10)
print(b10)

plot(b50)
print(b50)

plot(b150)
print(b150)

plot(b352)
print(b352)

We'll need to put the chains of each model into data frames.

post10  <- posterior_samples(b10)
post50  <- posterior_samples(b50)
post150 <- posterior_samples(b150)
post352 <- posterior_samples(b352)

Here is the code for the four individual plots.

p10 <- 
  ggplot(data =  d2[1:10 , ], 
         aes(x = weight, y = height)) +
  theme_bw() +
  geom_abline(intercept = post10[1:20, 1], slope = post10[1:20, 2],
              size = 1/3, alpha = .3) +
  geom_point(shape = 1, size = 2, color = "royalblue") +
  coord_cartesian(xlim = quantile(d2$weight, c(0, 1)),
                  ylim = quantile(d2$height, c(0, 1))) +
  labs(subtitle = "N = 10") +
  theme(panel.grid = element_blank())

p50 <-
  ggplot(data =  d2[1:50 , ], 
         aes(x = weight, y = height)) +
  theme_bw() +
  geom_abline(intercept = post50[1:20, 1], slope = post50[1:20, 2],
              size = 1/3, alpha = .3) +
  geom_point(shape = 1, size = 2, color = "royalblue") +
  coord_cartesian(xlim = quantile(d2$weight, c(0, 1)),
                  ylim = quantile(d2$height, c(0, 1))) +
  labs(subtitle = "N = 50") +
  theme(panel.grid = element_blank())

p150 <-
  ggplot(data =  d2[1:150 , ], 
         aes(x = weight, y = height)) +
  theme_bw() +
  geom_abline(intercept = post150[1:20, 1], slope = post150[1:20, 2],
              size = 1/3, alpha = .3) +
  geom_point(shape = 1, size = 2, color = "royalblue") +
  coord_cartesian(xlim = quantile(d2$weight, c(0, 1)),
                  ylim = quantile(d2$height, c(0, 1))) +
  labs(subtitle = "N = 150") +
  theme(panel.grid = element_blank())

p352 <- 
  ggplot(data =  d2[1:352 , ], 
         aes(x = weight, y = height)) +
  theme_bw() +
  geom_abline(intercept = post352[1:20, 1], slope = post352[1:20, 2],
              size = 1/3, alpha = .3) +
  geom_point(shape = 1, size = 2, color = "royalblue") +
  coord_cartesian(xlim = quantile(d2$weight, c(0, 1)),
                  ylim = quantile(d2$height, c(0, 1))) +
  labs(subtitle = "N = 352") +
  theme(panel.grid = element_blank())

Note how we used the good old bracket syntax (e.g., d2[1:10 , ]) to index rows from our d2 data. With tidyverse-style syntax, we could have done slice(d2, 1:10) or d2 %>% slice(1:10) instead.

Anyway, we saved each of these plots as objects. With a little help from the multiplot() function, we are going to arrange those plot objects into a grid in order to reproduce Figure 4.5.

Behold the code for the multiplot() function:

multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  library(grid)

  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)

  numPlots = length(plots)

  # If layout is NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # Make the panel
    # ncol: Number of columns of plots
    # nrow: Number of rows needed, calculated from # of cols
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                     ncol = cols, nrow = ceiling(numPlots/cols))
  }

  if (numPlots==1) {
    print(plots[[1]])

  } else {
    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))

    # Make each plot, in the correct location
    for (i in 1:numPlots) {
      # Get the i,j matrix positions of the regions that contain this subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))

      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}

We're finally ready to use multiplot() to make Figure 4.5.

multiplot(p10, p150, p50, p352, cols = 2)
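
If you'd rather not maintain a hand-copied helper, the gridExtra package's grid.arrange() should produce the same arrangement. Note that grid.arrange() fills by row rather than by column, so the plot order changes relative to the multiplot() call.

# install.packages("gridExtra")
library(gridExtra)

grid.arrange(p10, p50, p150, p352, ncol = 2)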

4.4.3.4. Plotting regression intervals and contours.

Remember, if you want to plot McElreath's mu_at_50 with ggplot2, you'll need to save it as a data frame or a tibble.

mu_at_50 <- 
  tibble(mu_at_50 = post[, 1] + post[, 2]*50)

And here is a version of the density plot of Figure 4.6.

mu_at_50 %>%
  ggplot(aes(x = mu_at_50)) +
  theme_classic() +
  geom_density(size = 0, fill = "royalblue") +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(x = expression(paste(mu, " at 50")))

brms doesn't appear to have an HPDI convenience function at this time. However, you can get HPDIs with the help of the HDInterval package.

library(HDInterval)

hdi(mu_at_50[,1], credMass = .89)
hdi(mu_at_50[,1], credMass = .95)
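
For comparison, here are the corresponding percentile-based intervals (my addition). With a posterior as symmetric as this one, they should land quite close to the HPDIs.

quantile(mu_at_50$mu_at_50, probs = c(.055, .945))  # 89% percentile interval
quantile(mu_at_50$mu_at_50, probs = c(.025, .975))  # 95% percentile interval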

Bonus: If you wanted to express those sweet 95% HPDIs on your density plot, you might do it like this.

mu_at_50 %>%
  ggplot(aes(x = mu_at_50)) +
  theme_classic() +
  geom_density(size = 0, fill = "royalblue") +
  geom_vline(xintercept = hdi(mu_at_50[,1], credMass = .95)[1:2],
             color = "royalblue4", linetype = 2) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(x = expression(paste(mu, " at 50")))

In brms, you would use fitted() to do what McElreath accomplished with link(). With summary = F, fitted() returns the posterior draws of $\mu$, one per post-warmup HMC iteration, for each value of weight in the data.

mu <- fitted(b4.3, summary = F)

str(mu)
as.data.frame(mu)

When you specify summary = F, fitted() returns a matrix of values with as many rows as there were post-warmup iterations across your HMC chains and as many columns as there were cases in your analysis. Because we had 4,000 post-warmup iterations and n = 352, fitted() returned a matrix of 4,000 rows and 352 columns. If you omit the summary = F argument, the default is summary = TRUE and fitted() will return summary information instead.
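
Here's a quick sanity check on those dimensions.

dim(mu)   # 4,000 post-warmup draws by 352 cases
nrow(d2)  # the 352 cases in the data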

Much like rethinking's link(), fitted() can accommodate custom predictor values with its newdata argument.

weight.seq <- tibble(weight = seq(from = 25, to = 70, by = 1))

mu <-
  fitted(b4.3,
         summary = F,
         newdata = weight.seq) %>%
  as_tibble() %>%
  mutate(Iter = 1:4000) %>%
  select(Iter, everything())

str(mu)

Anticipating ggplot2, we just went ahead and put this in a data frame. But we might do a little more data processing with the aid of the gather() function. With gather(), we'll convert the data from the wide format to the long format. If you're new to the distinction between wide and long data, you can learn more here or here.

mu <- 
  mu %>%
  gather(key, value, V1:V46)

# We might reformat key, our weight index, to numerals

mu <- 
  mu %>%
  mutate(key = str_extract(key, "\\d+") %>% as.integer())
# Learn more about `str_extract()` here: http://www.fabianheld.com/stringr/

# We might rename the last two vectors more descriptively
mu <- 
  mu %>%
  rename("weight" = key,
         "height" = value)

# Finally, we can use algebra to convert the weight values to the original ones we wanted
mu <-
  mu %>%
  mutate(weight = weight + 24)
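
As an alternative of my own, you can skip the string extraction and the algebra altogether: gather() stacks the V1 through V46 columns in order, 4,000 draws at a time, so the original weight.seq values line up against them directly.

mu_alt <-
  fitted(b4.3,
         summary = F,
         newdata = weight.seq) %>%
  as_tibble() %>%
  mutate(Iter = 1:4000) %>%
  gather(key, height, -Iter) %>%
  # each gathered column contributes 4000 rows, in weight.seq order
  mutate(weight = rep(weight.seq$weight, each = 4000)) %>%
  select(Iter, weight, height)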

Enough data processing. Here we reproduce McElreath's Figure 4.7.a.

d2 %>%
  ggplot(aes(x = weight, y = height)) +
  geom_point(data = mu %>% filter(Iter < 101),
             color = "navyblue", alpha = .1)

# or prettied up a bit
d2 %>%
  ggplot(aes(x = weight, y = height)) +
  geom_point(data = mu %>% filter(Iter < 101), 
             color = "navyblue", alpha = .05) +
  theme(text = element_text(family = "Times"),
        panel.grid = element_blank())

With fitted(), it's quite easy to plot a regression line and its intervals. Just omit the summary = F argument, leaving the default summary = TRUE.

muSummary <-
  fitted(b4.3, 
         newdata = weight.seq) %>%
  as_tibble() %>%
  bind_cols(weight.seq)

head(muSummary)

Here it is, our analogue to Figure 4.7.b. (If your summary columns came out with names like 2.5%ile and 97.5%ile instead of Q2.5 and Q97.5, you're on an older version of brms; adjust the names in the code accordingly.)

d2 %>%
  ggplot(aes(x = weight, y = height)) +
  geom_ribbon(data = muSummary, 
              aes(y = Estimate, ymin = `Q2.5`, ymax = `Q97.5`),
              fill = "grey70") +
  geom_line(data = muSummary, aes(y = Estimate)) +
  geom_point(color = "navyblue", shape = 1, size = 1.5, alpha = 2/3) +
  theme(text = element_text(family = "Times"),
        panel.grid = element_blank())

And if you wanted to use intervals other than the default 95% ones, you'd enter a probs argument like this: fitted(b4.3, newdata = weight.seq, probs = c(.25, .75)).

4.4.3.5. Prediction intervals.

Much as brms::fitted() was our analogue to rethinking::link(), brms::predict() is our analogue to rethinking::sim().

We can reuse our weight.seq data from before. But in case you forgot, here's that code again.

weight.seq <- tibble(weight = seq(from = 25, to = 70, by = 1))

Our predict() code.

pred.height <-
  predict(b4.3,
          newdata = weight.seq) %>%
  as_tibble() %>%
  bind_cols(weight.seq)

pred.height %>% slice(1:6)

This time the summary information in our data frame is for, as McElreath puts it, "simulated heights, not distributions of plausible average height, $\mu$" (p. 108). Another way of saying that is that these simulations are the joint consequence of both $\mu$ and $\sigma$, unlike the results of fitted(), which reflect $\mu$ alone. Our plot for Figure 4.8:

d2 %>%
  ggplot(aes(x = weight, y = height)) +
  geom_ribbon(data = pred.height, 
              aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              fill = "grey83") +
  geom_ribbon(data = muSummary, 
              aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              fill = "grey70") +
  geom_line(data = muSummary, aes(y = Estimate)) +
  geom_point(color = "navyblue", shape = 1, size = 1.5, alpha = 2/3) +
  theme(text = element_text(family = "Times"),
        panel.grid = element_blank())
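
Before moving on, here's a small comparison of my own to see the fitted()/predict() distinction numerically, using the first value in weight.seq. The prediction interval should be considerably wider than the interval for $\mu$.

bind_rows(
  muSummary   %>% slice(1) %>% mutate(type = "fitted(): mu only"),
  pred.height %>% slice(1) %>% mutate(type = "predict(): mu and sigma")
) %>%
  select(type, everything())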

4.5. Polynomial regression

Remember d?

d %>%
  glimpse()

We standardize our weight variable like so.

d <-
  d %>%
  mutate(weight.s = (weight - mean(weight))/sd(weight))
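
As a quick check of my own, the standardized variable should now have a mean of about zero and a standard deviation of 1.

d %>%
  summarise(mean = mean(weight.s),
            sd   = sd(weight.s))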

Here's the quadratic model in brms.

b4.5 <- 
  brm(data = d, family = gaussian,
      height ~ 1 + weight.s + I(weight.s^2),
      prior = c(set_prior("normal(178, 20)", class = "Intercept"),
                set_prior("normal(0, 10)", class = "b"),
                set_prior("cauchy(0, 1)", class = "sigma")),
      chains = 4, iter = 2000, warmup = 1000, cores = 4)

plot(b4.5)
print(b4.5)

Here's the legwork for our quadratic plot with fitted() and predict().

weight.seq <- data.frame(weight.s = seq(from = -2.2, to = 2, length.out = 30))

fitdquad <-
  fitted(b4.5, 
         newdata = weight.seq) %>%
  as_tibble() %>%
  bind_cols(weight.seq)

predquad <-
  predict(b4.5, 
          newdata = weight.seq) %>%
  as_tibble() %>%
  bind_cols(weight.seq)  

The code for our version of Figure 4.9.b., the quadratic model. You'll notice how little the code changed from that for Figure 4.8, above.

ggplot(data = d, 
       aes(x = weight.s, y = height)) +
  geom_ribbon(data = predquad, 
              aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              fill = "grey83") +
  geom_ribbon(data = fitdquad, 
              aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              fill = "grey70") +
  geom_line(data = fitdquad, 
            aes(y = Estimate), size = 1/4) +
  geom_point(color = "navyblue", shape = 1, size = 1.5, alpha = 1/3) +
  theme(text = element_text(family = "Times"),
        panel.grid = element_blank())

The cubic model:

b4.6 <- 
  brm(data = d, family = gaussian,
      height ~ 1 + weight.s + I(weight.s^2) + I(weight.s^3),
      prior = c(set_prior("normal(178, 100)", class = "Intercept"),
                set_prior("normal(0, 10)", class = "b"),
                set_prior("cauchy(0, 1)", class = "sigma")),
      chains = 4, iter = 2000, warmup = 1000, cores = 4)

The good old linear model:

b4.7 <- 
  brm(data = d, family = gaussian,
      height ~ 1 + weight.s,
      prior = c(set_prior("normal(178, 100)", class = "Intercept"),
                set_prior("normal(0, 10)", class = "b"),
                set_prior("cauchy(0, 1)", class = "sigma")),
      chains = 4, iter = 2000, warmup = 1000, cores = 4)

Here's the fitted(), predict(), and ggplot2 code for Figure 4.9.c., the cubic model.

fitd_cub <-
  fitted(b4.6, 
         newdata = weight.seq) %>%
  as_tibble() %>%
  bind_cols(weight.seq)

pred_cub <-
  predict(b4.6, 
          newdata = weight.seq) %>%
  as_tibble() %>%
  bind_cols(weight.seq) 

ggplot(data = d, 
       aes(x = weight.s, y = height)) +
  geom_ribbon(data = pred_cub, 
              aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              fill = "grey83") +
  geom_ribbon(data = fitd_cub, 
              aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              fill = "grey70") +
  geom_line(data = fitd_cub, 
            aes(y = Estimate), 
            size = 1/4) +
  geom_point(color = "navyblue", shape = 1, size = 1.5, alpha = 1/3) +
  theme(text = element_text(family = "Times"),
        panel.grid = element_blank())

And here's the fitted(), predict(), and ggplot2 code for Figure 4.9.a., the linear model.

fitd_line <-
  fitted(b4.7, 
         newdata = weight.seq) %>%
  as_tibble() %>%
  bind_cols(weight.seq)

pred_line <-
  predict(b4.7, 
          newdata = weight.seq) %>%
  as_tibble() %>%
  bind_cols(weight.seq) 

ggplot(data = d, 
       aes(x = weight.s, y = height)) +
  geom_ribbon(data = pred_line, 
              aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              fill = "grey83") +
  geom_ribbon(data = fitd_line, 
              aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              fill = "grey70") +
  geom_line(data = fitd_line, 
            aes(y = Estimate), size = 1/4) +
  geom_point(color = "navyblue", shape = 1, size = 1.5, alpha = 1/3) +
  theme(text = element_text(family = "Times"),
        panel.grid = element_blank())

Overthinking: Converting back to natural scale.

You can apply McElreath's conversion trick within the ggplot2 environment, too. Here it is with the linear model.

at <- c(-2, -1, 0, 1, 2)

ggplot(data = d, 
       aes(x = weight.s, y = height)) +
  geom_ribbon(data = pred_line, 
              aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              fill = "grey83") +
  geom_ribbon(data = fitd_line, 
              aes(y = Estimate, ymin = Q2.5, ymax = Q97.5),
              fill = "grey70") +
  geom_line(data = fitd_line, 
            aes(y = Estimate), size = 1/4) +
  geom_point(color = "navyblue", shape = 1, size = 1.5, alpha = 1/3) +
  theme(text = element_text(family = "Times"),
        panel.grid = element_blank()) +
  # Here it is!
  scale_x_continuous(breaks = at,
                     labels = round(at*sd(d$weight) + mean(d$weight), 1))

Note. The analyses in this document were done in R, with the brms, rethinking, HDInterval, and tidyverse packages loaded above.

Reference

McElreath, R. (2016). Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman & Hall/CRC Press.

rm(pos, growth, big, small, bigger, d, d2, sample_mu, sample_sigma, b4.1, b4.1halfCauchy, b4.2, post, b4.3, b4.4, N, dN, post10, post50, post150, post352, b10, b50, b150, b352, p10, p150, p50, p352, multiplot, mu_at_50, weight.seq, mu, muSummary, pred.height, b4.5, b4.6, b4.7, fitdquad, predquad, fitd_cub, pred_cub, fitd_line, pred_line, at)

